HYPERFLEET-1186 - fix: align documentation with codebase and fix pageSize default#211
HYPERFLEET-1186 - fix: align documentation with codebase and fix pageSize default#211sherine-k wants to merge 4 commits into
Conversation
|
Skipping CI for Draft Pull Request. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Central YAML (base), Organization UI (inherited) Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (7)
🔗 Linked repositories identifiedCodeRabbit considers these linked repositories for cross-repo context during reviews:
✅ Files skipped from review due to trivial changes (4)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughSummary by CodeRabbit
WalkthroughThis PR updates API and operator documentation (endpoints, statuses, spec validation, error/problem-details, pagination default in docs), deployment/config/developer guides (Helm flow, caller-identity resolution, config precedence), database schema and delete lifecycle docs, and reduces service list pagination defaults from 100 to 20 with a matching unit test update. Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 10 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Hello @sherine-k I think this is a wrong ticket number in the title. Can you confirm, please? |
Risk Score: 2 —
|
| Signal | Detail | Points |
|---|---|---|
| PR size | 1335 lines (>500) | +2 |
| Sensitive paths | none | +0 |
| Test coverage | Tests cover changed packages | +0 |
Computed by hyperfleet-risk-scorer
Update README.md, api-resources.md, database.md, and deployment.md to match the current state of the codebase. - Update Go version from 1.24 to 1.25 and PostgreSQL from 13 to 14 - Add generic resource types (WifConfigs, Channels, Versions) and plugins/ to project structure - Remove Ready condition from all JSON response examples (only Reconciled and LastKnownReconciled remain) - Fix adapter_statuses polymorphic columns: owner_type/owner_id to resource_type/resource_id - Fix soft delete column: deleted_at to deleted_time with deleted_by audit field - Replace labels table documentation with resources table documentation - Add Spec Validation section with opt-in qualifier linking to deployment.md - Add Statuses Endpoint vs. Resource Endpoint section - Restructure deployment.md into two chapters: Kubernetes/Helm and Local Execution - Fix pageSize default from 100 to 20 to match TypeSpec contract - Change migration tool reference from GORM to gormigrate - Update license to Apache License 2.0 Co-Authored-By: Claude Opus 4.6
The runtime default was 100 but the TypeSpec API contract defines pageSize as 20. Update the code to match the spec. - Change default Size from 100 to 20 in NewListArguments (types.go) - Update nil-guard defaults in ListByKind and ListByOwner (resource.go) - Update unit test assertion to expect 20 (types_test.go) Co-Authored-By: Claude <noreply@anthropic.com>
b3926de to
d2d1e75
Compare
Document all features implemented between v0.2.0 and v1.0.0 that were missing or partially covered in existing documentation. - Add PATCH, DELETE, and force-delete endpoint sections to api-resources.md - Add RFC 9457 Problem Details error response section with examples - Add delete lifecycle (Active → Finalizing → Hard-Deleted) to database.md - Add descriptor-driven delete policies (restrict/cascade) to database.md - Add caller identity documentation (header vs JWT precedence) to config.md - Add caller identity Helm config examples to deployment.md - Update endpoint listing tables for clusters and nodepools - Update README.md feature list and endpoint summaries for v1.0.0 - Fix incorrect "fails to load configuration" claims in deployment.md - Fix "startup remains non-blocking" claim in openapi/README.md - Fix trailing whitespace and double blank lines in database.md, deployment.md Co-Authored-By: Claude <noreply@anthropic.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (1)
pkg/services/resource.go (1)
275-275: ⚡ Quick winExtract a shared default page-size constant to avoid contract drift.
Line 275 and Line 303 hardcode
20, whilepkg/services/types.goalso defines the same default inNewListArguments. Keep one source of truth (e.g.,DefaultListSize) and reuse it in all fallback initializers to prevent future mismatch across handlers/services/tests.Proposed diff
--- a/pkg/services/types.go +++ b/pkg/services/types.go @@ const MaxListSize = 65500 +const DefaultListSize int64 = 20 @@ listArgs := &ListArguments{ Page: 1, - Size: 20, + Size: DefaultListSize, Search: "", }--- a/pkg/services/resource.go +++ b/pkg/services/resource.go @@ if args == nil { - args = &ListArguments{Page: 1, Size: 20} + args = &ListArguments{Page: 1, Size: DefaultListSize} } @@ if args == nil { - args = &ListArguments{Page: 1, Size: 20} + args = &ListArguments{Page: 1, Size: DefaultListSize} }Also applies to: 303-303
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/services/resource.go` at line 275, There are two places in resource initialization that hardcode Size: 20 (the ListArguments literal at the resource handlers); extract and reuse a single constant (e.g., DefaultListSize) declared alongside NewListArguments in pkg/services/types.go, then replace the literal 20 in the ListArguments initializers (and any other fallback initializers) with DefaultListSize so ListArguments, NewListArguments and the handlers share the same source of truth.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/api-resources.md`:
- Around line 331-332: Update the docs text describing cluster deletion to
reflect descriptor-driven, policy-dependent behavior: instead of saying the
soft-delete "cascades the soft-delete to all child nodepools", state that
children are handled conditionally per the deletion policy (see the service
logic in resource deletion): nodepools with required adapters are soft-deleted
(set deleted_time/deleted_by and increment generation and enter Finalizing),
while children without required adapters are hard-deleted immediately; keep the
Finalizing explanation and that hard-delete occurs once adapters report
Finalized=True.
In `@docs/database.md`:
- Around line 62-73: Update the delete-lifecycle docs to remove the unsupported
"adapter quorum race" wording and instead state that adapter finalization is
concurrency-guarded: adapter status processing serializes per-resource via
GetForUpdate and hard-delete only proceeds when adapters report Finalized=True
at the matching generation (ObservedGeneration == resource.Generation) as
checked in tryHardDeleteNodePool and tryHardDeleteCluster; clarify that
child-creation-under-finalizing-parent is already prevented by
ResourceService.Create which acquires the parent row lock (GetForUpdate) and
rejects when parent.DeletedTime != nil (the cluster→nodepools
force-delete/create path likewise rejects deleted parents); add a robustness
note to recursive delete policy that registry.Validate currently does not detect
ParentKind cycles so deleteResourceTree / ChildrenOf must be protected (either
add cycle detection in registry.Validate or guard recursion with a visited set
to avoid unbounded recursion); and explicitly state that restrict enforcement
(ExistsByOwner used in checkCanDelete) runs inside TransactionMiddleware and
relies on the parent row lock for correct isolation.
- Line 72: Update the force-delete documentation to correctly reflect
implemented semantics: state that POST .../force-delete requires the resource to
already be in Finalizing (i.e., deleted_time set) and only bypasses the
“required adapters must report Finalized=True” gating (do not say it bypasses
finalization entirely), document idempotency that repeated POSTs after a
hard-delete return 404 (service maps gorm.ErrRecordNotFound to NotFound), and
clarify cascade behavior that Cluster force-delete hard-deletes all child
NodePools and their adapter statuses while NodePool force-delete only
hard-deletes the NodePool and its adapter statuses; reference
Finalizing/deleted_time, POST .../force-delete, Cluster, and NodePool in the
updated wording.
In `@docs/deployment.md`:
- Around line 496-503: The documentation incorrectly states the command fails
when no config file is found; update docs (including docs/deployment.md) to
describe the actual loader.Load() behavior: when no config file is present the
loader returns nil and logs "No config file found, using environment variables
and flags only", so the application falls back to environment variables and
flags and continues; only treat failures where loader.Load() returns an error as
fatal (as done in the serve command). Reference loader.Load(), the log message
"No config file found, using environment variables and flags only", and
pkg/config/loader.go when making this correction.
In `@README.md`:
- Line 122: The force-delete endpoints lack caller authorization checks; update
the handlers and service layer to enforce RBAC by validating the calling
principal before performing destructive operations: in pkg/handlers/cluster.go
add an authorization check that extracts caller identity/roles from the request
and denies access if not permitted, then pass an authorized context/principal
into pkg/services/cluster.go; in pkg/services/cluster.go verify the principal's
permission prior to calling clusterDao.GetForUpdate and before deleting adapter
statuses/child nodepools; also tighten NodePool logic where GetByIDAndOwner is
used to ensure it validates caller identity/permissions (not just path
ownership) and finally update README.md to document the required RBAC/permission
prerequisites for invoking the force-delete endpoints.
- Around line 92-93: Update README.md to document the trust boundary for config
file paths used by the loader (the --config flag and HYPERFLEET_CONFIG env var)
and explicitly state that these paths must be operator-trusted; note that the
loader only performs an existence check (os.Stat) for explicit sources and then
reads via viper.ReadInConfig, so callers must ensure the file path, permissions
and ownership are secure (suggested mode/owner and disallow world-writable
access). Also change the wording to clarify that schema/content validation is
performed in code (viper.UnmarshalExact + validator.Struct plus custom
TLS/JWT/timeouts/identity checks), and that the documented gap is
path/permission safety rather than configuration content validation.
---
Nitpick comments:
In `@pkg/services/resource.go`:
- Line 275: There are two places in resource initialization that hardcode Size:
20 (the ListArguments literal at the resource handlers); extract and reuse a
single constant (e.g., DefaultListSize) declared alongside NewListArguments in
pkg/services/types.go, then replace the literal 20 in the ListArguments
initializers (and any other fallback initializers) with DefaultListSize so
ListArguments, NewListArguments and the handlers share the same source of truth.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 8dcd780c-beb6-4d3d-b1da-1195c70532ec
📒 Files selected for processing (9)
README.mddocs/api-resources.mddocs/config.mddocs/database.mddocs/deployment.mdopenapi/README.mdpkg/services/resource.gopkg/services/types.gopkg/services/types_test.go
🔗 Linked repositories identified
CodeRabbit considers these linked repositories for cross-repo context during reviews:
openshift-hyperfleet/architecture(manual)openshift-hyperfleet/hyperfleet-api(manual)openshift-hyperfleet/hyperfleet-sentinel(manual)openshift-hyperfleet/hyperfleet-adapter(manual)openshift-hyperfleet/hyperfleet-broker(manual)
| Resources follow a three-phase delete lifecycle: | ||
|
|
||
| ```text | ||
| Active ──(DELETE)──▶ Finalizing ──(adapters report Finalized=True)──▶ Hard-Deleted | ||
| │ | ||
| └──(POST /force-delete)──▶ Hard-Deleted | ||
| ``` | ||
|
|
||
| 1. **Active** — Normal state. Resource is visible in list queries and can be updated. | ||
| 2. **Finalizing** (soft-deleted) — `DELETE` sets `deleted_time` and `deleted_by`, increments `generation`. The resource stays in the database so adapters can observe the deletion and clean up external state. Soft-deleted records are excluded from list queries by default. New child resources cannot be created under a finalizing parent (returns `409 Conflict`). | ||
| 3. **Hard-Deleted** — Permanently removed from the database. This happens automatically when all required adapters report `Finalized=True` at the current generation. If adapters are stuck, `POST .../force-delete` bypasses finalization and hard-deletes immediately. | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify delete lifecycle uses proper transaction isolation
rg -n -C5 'deleted_time|Finalizing' pkg/dao/ pkg/services/ --type=go | rg -C3 'Begin|Transaction|Lock'
# Check for generation increment atomicity
ast-grep --pattern $'func $_(ctx context.Context, id string) $_ {
$$$
deleted_time
$$$
generation
$$$
}'Repository: openshift-hyperfleet/hyperfleet-api
Length of output: 61
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show docs snippet around the referenced lines
echo "---- docs/database.md (around lines 50-90) ----"
nl -ba docs/database.md | sed -n '50,95p'
# Search for referenced lifecycle terms in Go
echo "---- rg lifecycle terms (case-sensitive) ----"
rg -n --hidden --no-ignore-vcs "force-delete|Finalizing|Finalized|deleted_time|deleted_by|generation" . -S -g'*.go' || true
echo "---- rg lifecycle terms in SQL/migrations/docs ----"
rg -n --hidden --no-ignore-vcs "force-delete|Finalizing|Finalized|deleted_time|deleted_by|generation" . -S -g'*.sql' -g'*.md' -g'*.go' -g'*.yaml' -g'*.yml' || true
echo "---- rg delete policies text (restrict/cascade) ----"
rg -n --hidden --no-ignore-vcs "restrict|cascade|force-delete|POST /force-delete|/force-delete" . -S -g'*.go' -g'*.md' || trueRepository: openshift-hyperfleet/hyperfleet-api
Length of output: 188
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate where DELETE lifecycle is implemented by searching for status transitions / queries
echo "---- rg for 'Finalize' / 'hard delete' ----"
rg -n --hidden --no-ignore-vcs "hard-?deleted|HardDeleted|Finalized|Finalize|force delete|force-delete" . -S -g'*.go' || true
echo "---- rg for 'deleted_time' exact column usage ----"
rg -n --hidden --no-ignore-vcs "deleted_time" . -S -g'*.go' -g'*.sql' -g'*.yaml' -g'*.yml' || true
echo "---- rg for 'generation' updates related to deletes ----"
rg -n --hidden --no-ignore-vcs "generation.*(deleted|delete|finaliz)" . -S -g'*.go' || true
echo "---- list likely DAO/service directories ----"
ls -la
find . -maxdepth 3 -type d \( -name 'dao' -o -name 'service*' -o -name 'pkg' \) 2>/dev/null | headRepository: openshift-hyperfleet/hyperfleet-api
Length of output: 24932
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "---- docs/database.md (around lines 50-95) ----"
nl -ba docs/database.md | sed -n '50,95p'
echo "---- rg lifecycle terms in Go (case-sensitive) ----"
rg -n --hidden --no-ignore-vcs "force-delete|Finalizing|Finalized|deleted_time|deleted_by|generation" . -S -g'*.go' || true
echo "---- rg lifecycle terms in SQL/migrations/docs ----"
rg -n --hidden --no-ignore-vcs "force-delete|Finalizing|Finalized|deleted_time|deleted_by|generation" . -S -g'*.sql' -g'*.md' -g'*.yaml' -g'*.yml' -g'*.go' || true
echo "---- rg delete policy text ----"
rg -n --hidden --no-ignore-vcs "restrict|cascade|force-delete|POST /force-delete|/force-delete" . -S -g'*.go' -g'*.md' || trueRepository: openshift-hyperfleet/hyperfleet-api
Length of output: 188
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "---- docs/database.md lines 55-90 (with pseudo line numbers) ----"
awk 'NR>=55 && NR<=90 {printf "%d\t%s\n", NR, $0}' docs/database.md
echo "---- rg restrict/cascade/force-delete implementation text in Go/handlers ----"
rg -n --hidden --no-ignore-vcs "(restrict|cascade|force-delete|force delete|enforced recursively|cycle)" . -S -g'*.go' -g'*.md'
echo "---- locate force-delete handlers ----"
rg -n --hidden --no-ignore-vcs "ForceDelete|/force-delete" pkg plugins -S -g'*.go'
echo "---- inspect likely delete policy code locations ----"
rg -n --hidden --no-ignore-vcs "delete policy|policy.*delete|DeletePolicy|cascade|restrict" pkg -S -g'*.go'
echo "---- transaction/locking primitives in relevant services/daos ----"
rg -n --hidden --no-ignore-vcs "(BeginTx|Begin\\(|Commit\\(|Rollback\\(|SELECT FOR UPDATE|FOR UPDATE|Lock\\(|Mutex|lock\\.)" pkg -S -g'*.go'
echo "---- show relevant sections in node_pool.go/cluster.go/aggregation.go/resource.go ----"
# Use sed to avoid huge output; extract around the comment blocks found earlier by rg.
sed -n '520,640p' pkg/services/node_pool.go
sed -n '420,560p' pkg/services/cluster.go
sed -n '220,420p' pkg/services/aggregation.go
sed -n '720,820p' pkg/services/aggregation.go
sed -n '120,200p' pkg/services/resource.goRepository: openshift-hyperfleet/hyperfleet-api
Length of output: 50079
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show remainder of deleteResourceTree and checkCanDelete implementation
echo "---- pkg/services/resource.go (Delete + deleteResourceTree continuation) ----"
sed -n '120,260p' pkg/services/resource.go
echo "---- pkg/services/resource.go (checkCanDelete + helpers) ----"
rg -n "func \\(s \\*sqlResourceService\\) checkCanDelete|checkCanDelete\\(" -S pkg/services/resource.go
sed -n '260,430p' pkg/services/resource.go
# Show node_pool.go soft-delete (Delete) and where tryHardDeleteNodePool is called
echo "---- pkg/services/node_pool.go relevant sections ----"
rg -n "func \\(s \\*sqlNodePoolService\\) Delete|MarkDeleted|ForceDelete|tryHardDeleteNodePool|final" pkg/services/node_pool.go -S
sed -n '180,340p' pkg/services/node_pool.go
sed -n '340,620p' pkg/services/node_pool.go
# Show cluster.go soft-delete (Delete) and where tryHardDeleteCluster is called
echo "---- pkg/services/cluster.go relevant sections ----"
rg -n "func \\(s \\*sqlClusterService\\) Delete|MarkDeleted|ForceDelete|tryHardDeleteCluster|tryHardDelete" pkg/services/cluster.go -S
sed -n '90,240p' pkg/services/cluster.go
sed -n '240,560p' pkg/services/cluster.go
# Show registry.ChildrenOf to see if cycle detection exists
echo "---- pkg/registry - ChildrenOf implementation ----"
rg -n "func (ChildrenOf|ChildrenOf\\()" pkg/registry -S
rg -n "ChildrenOf\\(" pkg/registry -S
# Print descriptor registry parts
sed -n '1,220p' pkg/registry/registry.go 2>/dev/null || true
sed -n '1,220p' pkg/registry/descriptor.go
ls -ლა pkg/registry
# Show ForceDelete tests for idempotency / repeated calls
echo "---- ForceDelete tests (look for 'again' or repeated) ----"
rg -n "force-delete.*again|ForceDelete.*again|force delete.*twice|twice" test pkg/services/*_test.go -S || true
rg -n "force-delete" pkg/services/*_test.go test/integration/*_test.go -S | head -n 80Repository: openshift-hyperfleet/hyperfleet-api
Length of output: 44036
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n --hidden --no-ignore-vcs "(Finalizing state|deleted_time.*nil|DeletedTime == nil|deleted_time|DeletedTime != nil|in Finalizing)" pkg/handlers pkg/services -S -g'*.go'
rg -n --hidden --no-ignore-vcs "force-delete" test pkg/handlers pkg/services -S -g'*.go' | head -n 200
rg -n --hidden --no-ignore-vcs "ForceDelete.*(twice|again|already)" test -S -g'*.go'
rg -n --hidden --no-ignore-vcs "not in Finalizing state" pkg -S -g'*.go'Repository: openshift-hyperfleet/hyperfleet-api
Length of output: 5396
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "---- pkg/services/resource.go lines 1-120 (find parent.DeletedTime != nil) ----"
sed -n '1,140p' pkg/services/resource.go
echo "---- pkg/handlers create/patch blocks for finalizing (search 409 / Finalizing / deleted_time) ----"
rg -n --hidden --no-ignore-vcs "Finalizing|deleted_time|DeletedTime != nil|not in Finalizing state" pkg/handlers -S -g'*.go'
echo "---- find transaction middleware usage in handlers/services (BeginTx/TransactionMiddleware) ----"
rg -n --hidden --no-ignore-vcs "(transaction|TransactionMiddleware|WithTx|BeginTx|db\\.With|gorm\\.Begin|Begin\\()" pkg/handlers pkg/services pkg/db -S -g'*.go'
echo "---- pkg/services/resource.go checkCanDelete caller context ----"
rg -n --hidden --no-ignore-vcs "checkCanDelete\\(" -S pkg/services/resource.goRepository: openshift-hyperfleet/hyperfleet-api
Length of output: 13265
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "---- Where TransactionMiddleware is applied ----"
rg -n --hidden --no-ignore-vcs "TransactionMiddleware\\(" pkg/handlers cmd plugins -S -g'*.go'
echo "---- cluster_nodepools create/patch checks for cluster.DeletedTime ----"
rg -n --hidden --no-ignore-vcs "cluster\\.DeletedTime != nil|DeletedTime != nil.*cluster|marked for deletion" pkg/handlers -S -g'*.go' || true
echo "---- find nodepool create handler and its conflict condition ----"
rg -n --hidden --no-ignore-vcs "NodePool.*Create|CreateNodePool|nodepools.*post" pkg/handlers -S -g'*.go' || true
echo "---- check generic child creation in resource.go already shown; now confirm it maps to 409 in handlers ----"
rg -n --hidden --no-ignore-vcs "marked for deletion" pkg/handlers -S -g'*.go' || true
echo "---- find force-delete idempotency tests (called twice) ----"
rg -n --hidden --no-ignore-vcs "force-delete.*returns|force-deleted.*again|ForceDelete.*again|already.*force-delete" test pkg -S -g'*.go' || trueRepository: openshift-hyperfleet/hyperfleet-api
Length of output: 3563
Fix delete-lifecycle review: remove unsupported adapter-quorum race claims; add real gaps (policy recursion cycles, clarify locking/isolation)
- The Active → Finalizing → Hard-Deleted quorum is concurrency-guarded: adapter status processing serializes per resource via
GetForUpdate, and hard-delete only runs when adapters reportFinalized=Trueat the same generation (adapterStatus.ObservedGeneration == <resource>.Generation) intryHardDeleteNodePool/tryHardDeleteCluster(CWE-362). - Child creation under a finalizing parent is already enforced with 409s:
ResourceService.Createlocks the parent (GetForUpdate) and rejects whenparent.DeletedTime != nil, and the cluster→nodepools force-delete/create path similarly rejects whencluster.DeletedTime != nil(CWE-367). The docs don’t need extra TOCTOU language here. - The recursive delete policy section still needs a robustness note:
registry.Validatedoes not detect cyclicParentKindgraphs, so recursive deletion (deleteResourceTree/ChildrenOf) can recurse indefinitely if descriptors are mis-registered (CWE-674). Add cycle detection inregistry.Validateor guard recursion with a visited set. restrictenforcement (ExistsByOwnerincheckCanDelete) is not explicitly documented in terms of transaction/isolation guarantees; the docs should state that the parent delete path runs under a transaction (TransactionMiddleware) and relies on the parent row lock to avoid inconsistent decisions (CWE-367).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/database.md` around lines 62 - 73, Update the delete-lifecycle docs to
remove the unsupported "adapter quorum race" wording and instead state that
adapter finalization is concurrency-guarded: adapter status processing
serializes per-resource via GetForUpdate and hard-delete only proceeds when
adapters report Finalized=True at the matching generation (ObservedGeneration ==
resource.Generation) as checked in tryHardDeleteNodePool and
tryHardDeleteCluster; clarify that child-creation-under-finalizing-parent is
already prevented by ResourceService.Create which acquires the parent row lock
(GetForUpdate) and rejects when parent.DeletedTime != nil (the cluster→nodepools
force-delete/create path likewise rejects deleted parents); add a robustness
note to recursive delete policy that registry.Validate currently does not detect
ParentKind cycles so deleteResourceTree / ChildrenOf must be protected (either
add cycle detection in registry.Validate or guard recursion with a visited set
to avoid unbounded recursion); and explicitly state that restrict enforcement
(ExistsByOwner used in checkCanDelete) runs inside TransactionMiddleware and
relies on the parent row lock for correct isolation.
| The `migrate` and `serve` commands require a configuration file. The loader checks `--config` flag, then `HYPERFLEET_CONFIG` env var, then `/etc/hyperfleet/config.yaml`, then `./configs/config.yaml`. | ||
| If none are found, the command fails with `failed to load configuration`. Copy the example config or point to your own. |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify config loader validates file permissions and paths
rg -n -C3 'HYPERFLEET_CONFIG|--config' --type=go --glob '!vendor/**' | head -50
# Check for path validation or permission checks
ast-grep --pattern $'func $_(path string) $_ {
$$$
os.Stat($$$)
$$$
}'Repository: openshift-hyperfleet/hyperfleet-api
Length of output: 3029
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Inspect config loader path resolution + any permission/ownership checks
echo "=== pkg/config/loader.go: resolveAndReadConfigFile ==="
rg -n "resolveAndReadConfigFile|HYPERFLEET_CONFIG|/etc/hyperfleet/config.yaml|configs/config.yaml|ReadInConfig|os\.Stat|UnmarshalExact" pkg/config/loader.go
echo
sed -n '1,220p' pkg/config/loader.go
# 2) Inspect config validation/unmarshal in loader.go (schema enforcement)
echo
echo "=== pkg/config/loader.go: validation calls ==="
rg -n "UnmarshalExact|validator\.Struct|Validate|identity|JWT|TLS|timeout" pkg/config/loader.go
# 3) Force-delete wiring: handler -> service -> auth/RBAC/owner scoping
echo
echo "=== ForceDelete handlers/services (ownership/RBAC) ==="
rg -n "ForceDelete|force-delete" cmd pkg/handlers pkg/services cmd/hyperfleet-api/server/routes.go
echo
# show key snippets if present
for f in cmd/hyperfleet-api/server/routes.go \
pkg/handlers/cluster.go \
pkg/handlers/cluster_nodepools.go \
pkg/services/cluster.go \
pkg/services/node_pool.go \
pkg/services/resource.go
do
if [ -f "$f" ]; then
echo
echo "=== $f (relevant lines) ==="
rg -n "ForceDelete|force-delete|JWTHandler|PublicPaths|CallerIdentityMiddleware|deleteResourceTree|restrict|cascade" "$f" || true
fi
done
# 4) deleteResourceTree recursion guards
echo
echo "=== pkg/services/resource.go: deleteResourceTree ==="
rg -n "func .*deleteResourceTree|deleteResourceTree|visited|depth|cycle|restrict|cascade" pkg/services/resource.go || true
if [ -f pkg/services/resource.go ]; then
sed -n '1,260p' pkg/services/resource.go
fiRepository: openshift-hyperfleet/hyperfleet-api
Length of output: 30155
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Force-delete authN/authZ + middleware wiring
rg -n "PublicPaths|JWTHandler|CallerIdentityMiddleware|force-delete|force delete|ForceDelete" cmd/hyperfleet-api/server/routes.go
echo "----"
sed -n '1,220p' cmd/hyperfleet-api/server/routes.go
echo "----"
rg -n "handleForceDelete|ForceDeleteRequest|jwt|RBAC|authorize|permission|role" pkg/handlers/framework.go pkg/handlers/cluster.go pkg/handlers/cluster_nodepools.go
echo "----"
sed -n '120,230p' pkg/handlers/framework.go
echo "----"
sed -n '140,230p' pkg/handlers/cluster.go
echo "----"
sed -n '120,210p' pkg/handlers/cluster_nodepools.go
echo "----"
# Service-level checks for ForceDelete
rg -n "func \\(s \\*sqlClusterService\\) ForceDelete|func \\(s \\*sqlNodePoolService\\) ForceDelete|DeletedTime|owner|GetBy.*Owner|auth|RBAC|permission" pkg/services/cluster.go pkg/services/node_pool.go
echo "----"
sed -n '450,620p' pkg/services/cluster.go
echo "----"
sed -n '220,360p' pkg/services/node_pool.go
echo "----"
# deleteResourceTree recursion/cycle guards
rg -n "deleteResourceTree\\(|visited|depth|cycle|max depth" pkg/services/resource.go
echo "----"
sed -n '130,260p' pkg/services/resource.go
echo "----"
# Registry descriptor graph validation (cycles, depth limits)
rg -n "cycle|acyclic|topolog|visited|graph|ChildrenOf\\(" pkg/registry
fd . pkg/registry -t f | head -50
for f in $(fd -t f pkg/registry | tr '\n' ' '); do
# keep output small by only showing likely validation locations
rg -n "cycle|acyclic|topolog|graph|ChildrenOf\\(" "$f" && echo "---- in $f ----" && break
doneRepository: openshift-hyperfleet/hyperfleet-api
Length of output: 28168
README.md: Document the file-path trust boundary for --config / HYPERFLEET_CONFIG
README.md lines 92-93: --config and HYPERFLEET_CONFIG are used as the config file path (only an existence check via os.Stat for explicit sources, then viper.ReadInConfig), with no allowlist/permission/ownership validation—external path control is a CWE-73 risk (path-controlled file read). Document that the path must be operator-trusted and specify expected file permissions/ownership.
Also adjust the wording to reflect that config content is validated in code (viper.UnmarshalExact + validator.Struct + custom TLS/JWT/timeouts/identity checks); the missing gap is path/permission safety, not schema/content enforcement.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@README.md` around lines 92 - 93, Update README.md to document the trust
boundary for config file paths used by the loader (the --config flag and
HYPERFLEET_CONFIG env var) and explicitly state that these paths must be
operator-trusted; note that the loader only performs an existence check
(os.Stat) for explicit sources and then reads via viper.ReadInConfig, so callers
must ensure the file path, permissions and ownership are secure (suggested
mode/owner and disallow world-writable access). Also change the wording to
clarify that schema/content validation is performed in code
(viper.UnmarshalExact + validator.Struct plus custom TLS/JWT/timeouts/identity
checks), and that the documented gap is path/permission safety rather than
configuration content validation.
| - `GET/POST /api/hyperfleet/v1/clusters` | ||
| - `GET /api/hyperfleet/v1/clusters/{id}` | ||
| - `GET/PATCH/DELETE /api/hyperfleet/v1/clusters/{id}` | ||
| - `POST /api/hyperfleet/v1/clusters/{id}/force-delete` |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if force-delete handler enforces stricter authz than regular DELETE
rg -n -C5 'force-delete|ForceDelete' --type=go pkg/handlers/ | head -80Repository: openshift-hyperfleet/hyperfleet-api
Length of output: 4993
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Locate routes/endpoints mentioning force-delete and where authz/middleware is wired.
rg -n --hidden --no-ignore-vcs "force-delete" . || true
rg -n --hidden --no-ignore-vcs "ForceDelete" . || true
# 2) Inspect surrounding handler code for ForceDelete and any normal DELETE methods.
echo "---- cluster.go around ForceDelete and any Delete handlers ----"
sed -n '130,240p' pkg/handlers/cluster.go || true
echo "---- cluster_nodepools.go around ForceDelete and any Delete handlers ----"
sed -n '1,260p' pkg/handlers/cluster_nodepools.go || true
# 3) Inspect framework handler for handleForceDelete / authz-related hooks.
sed -n '1,240p' pkg/handlers/framework.go || true
# 4) README snippet around line ~122 (as referenced by review comment)
ls -la || true
sed -n '90,150p' README.md || trueRepository: openshift-hyperfleet/hyperfleet-api
Length of output: 30129
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Inspect plugin/router wiring around force-delete handlers.
sed -n '1,140p' plugins/clusters/plugin.go
# 2) Find router/middleware/auth wiring keywords.
rg -n --hidden --no-ignore-vcs "middleware|auth|RBAC|Role|scope|Authorize|authorization|Permission|entitle|claims|user|principal" pkg cmd plugins charts configs serve . || true
# 3) Check how handlers get context: ownership/user extracted somewhere.
rg -n --hidden --no-ignore-vcs "GetByIDAndOwner|owner|tenant|project|account|userID|principal|subject" pkg | head -200 || true
# 4) Inspect cluster/node_pool service ForceDelete implementations for any explicit authz checks.
sed -n '450,660p' pkg/services/cluster.go
sed -n '220,380p' pkg/services/node_pool.go
# 5) Inspect regular DELETE handlers to compare logic surface.
# cluster DELETE handler
sed -n '120,220p' pkg/handlers/cluster.go
# nodepool DELETE handler
sed -n '80,200p' pkg/handlers/cluster_nodepools.goRepository: openshift-hyperfleet/hyperfleet-api
Length of output: 50392
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "---- pkg/services/cluster.go: ForceDelete + any SoftDelete/Delete around it ----"
rg -n "func \\(s \\*sqlClusterService\\) (ForceDelete|SoftDelete|Delete)" pkg/services/cluster.go
sed -n '430,620p' pkg/services/cluster.go
echo "---- pkg/services/node_pool.go: ForceDelete + any SoftDelete/Delete around it ----"
rg -n "func \\(s \\*sqlNodePoolService\\) (ForceDelete|SoftDelete|Delete)" pkg/services/node_pool.go
sed -n '180,360p' pkg/services/node_pool.go
echo "---- pkg/dao: ForceDelete-related queries in cluster/node_pool ----"
rg -n "ForceDelete" pkg/dao || true
rg -n "SoftDelete|Soft delete|deleted_time" pkg/dao/cluster*.go pkg/dao/*node*pool*.go || true
echo "---- handler delete endpoints (for comparison): pkg/handlers/cluster.go SoftDelete/DELETE semantics ----"
rg -n "func \\(h ClusterHandler\\) (SoftDelete|Delete|ForceDelete)" pkg/handlers/cluster.go
sed -n '120,220p' pkg/handlers/cluster.go
echo "---- handler delete endpoints (for comparison): pkg/handlers/cluster_nodepools.go SoftDelete/ForceDelete semantics ----"
rg -n "func \\(h ClusterNodePoolsHandler\\) (SoftDelete|ForceDelete)" pkg/handlers/cluster_nodepools.go
sed -n '80,200p' pkg/handlers/cluster_nodepools.go
echo "---- cmd/hyperfleet-api/server: middleware/router auth wiring (targeted) ----"
rg -n "Use\\(|middleware|auth\\.|authz|RBAC|authorize|permission|JWT|Identity" cmd/hyperfleet-api/server cmd/hyperfleet-api/environments cmd/hyperfleet-api/*.go || trueRepository: openshift-hyperfleet/hyperfleet-api
Length of output: 22615
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Cluster service: SoftDelete implementation details
echo "---- pkg/services/cluster.go: SoftDelete (and related delete methods) ----"
sed -n '120,260p' pkg/services/cluster.go
sed -n '260,420p' pkg/services/cluster.go
sed -n '420,560p' pkg/services/cluster.go
# Node pool service: see SoftDelete/ Delete / ForceDelete call patterns re owner checks
echo "---- pkg/services/node_pool.go: SoftDelete and Delete ----"
sed -n '150,280p' pkg/services/node_pool.go
sed -n '280,360p' pkg/services/node_pool.go
# DAO cluster: see whether GetForUpdate or SoftDelete uses caller/owner scoping
echo "---- pkg/dao/cluster*.go: search for GetForUpdate / SoftDelete ownership ----"
rg -n "type .*ClusterDao|GetForUpdate|SoftDelete|DeletedTime|ByOwner|Owner" pkg/dao --glob '*cluster*' || true
rg -n "GetForUpdate\\(|SoftDelete\\(|Delete\\(" pkg/dao --glob '*cluster*' || true
# Show cluster dao implementation for GetForUpdate and any owner-filtered queries
# (we don't know filename; use fd-like search with rg results)
echo "---- showing matching cluster dao files with GetForUpdate ----"
rg -n "GetForUpdate\\(" pkg/dao --glob '*cluster*' --files-with-matches || true
# Print relevant sections from all files that define GetForUpdate in cluster dao (limited)
for f in $(rg -n "func \\(.*\\) GetForUpdate\\(" pkg/dao --glob '*cluster*' --files-with-matches); do
echo "==== $f ===="
sed -n '1,220p' "$f"
sed -n '220,420p' "$f"
done || true
# Node pool DAO: GetByIDAndOwner scoping
echo "---- pkg/dao/node*pool*.go: GetByIDAndOwner ----"
rg -n "GetByIDAndOwner\\(" pkg/dao --glob '*node*pool*' || true
for f in $(rg -n "GetByIDAndOwner\\(" pkg/dao --glob '*node*pool*' --files-with-matches || true); do
echo "==== $f ===="
sed -n '1,260p' "$f"
done
# Handlers/tests: look for authz expectations for force-delete
echo "---- pkg/handlers/cluster_test.go: search for Unauthorized/Forbidden around ForceDelete ----"
rg -n "ForceDelete|force-delete|401|403|Unauthorized|Forbidden" pkg/handlers/cluster_test.go pkg/handlers/cluster_nodepools_test.go | head -200 || true
echo "---- pkg/handlers/cluster_nodepools_test.go: search for authz around ForceDelete ----"
rg -n "Unauthorized|Forbidden|401|403" pkg/handlers/cluster_nodepools_test.go pkg/handlers/cluster_test.go | head -200 || true
echo "---- integration tests: search for force-delete authz expectations ----"
rg -n "force-delete.*(401|403|Unauthorized|Forbidden)" test/integration -S --glob '*force*' || true
rg -n "Unauthorized|Forbidden|401|403" test/integration/clusters_test.go test/integration/node_pools_test.go | head -200 || trueRepository: openshift-hyperfleet/hyperfleet-api
Length of output: 38188
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "---- README.md: search for force-delete auth/RBAC/runbook text ----"
rg -n "force-delete|ForceDelete|RBAC|role|permission|authorize|authorization|runbook|safety|admin" README.md docs/api-resources.md docs/runbook.md docs/database.md || true
echo "---- README.md around line ~100-150 (includes force-delete endpoint list) ----"
sed -n '90,160p' README.md || true
echo "---- docs/api-resources.md: force-delete cluster section ----"
rg -n "POST /api/hyperfleet/v1/clusters/\\{.*\\}/force-delete|force-delete cluster|Force Delete Cluster|Force Delete" docs/api-resources.md || true
# show likely relevant sections by line ranges
sed -n '340,420p' docs/api-resources.md || true
echo "---- docs/api-resources.md: force-delete nodepool section ----"
sed -n '520,600p' docs/api-resources.md || true
echo "---- cmd/hyperfleet-api/server/routes.go: full to see if authz is wired ----"
sed -n '1,200p' cmd/hyperfleet-api/server/routes.go || true
sed -n '200,420p' cmd/hyperfleet-api/server/routes.go || true
echo "---- search cmd/hyperfleet-api/server for authorize/authz/RBAC names ----"
rg -n --no-ignore-vcs "authorize|authorization|authz|RBAC|role|permission|Scope" cmd/hyperfleet-api/server pkg/middleware pkg/auth || trueRepository: openshift-hyperfleet/hyperfleet-api
Length of output: 16418
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "---- Find Cluster model struct in pkg/api ----"
rg -n "type Cluster struct" pkg/api || true
# show likely file
f=$(rg -n "type Cluster struct" pkg/api --files-with-matches | head -n 1 || true)
echo "cluster struct file: $f"
if [ -n "${f:-}" ]; then
sed -n '1,220p' "$f"
fi
echo "---- Search for owner_id / OwnerID in Cluster struct or fields ----"
rg -n "OwnerID|owner_id|owner" "$f" || true
echo "---- Confirm cluster handler SoftDelete (regular DELETE) calls service without ownership scoping ----"
sed -n '130,220p' pkg/handlers/cluster.go
echo "---- docs/api-resources.md force-delete cluster section for any RBAC/access control text ----"
# narrow by lines around "Force Delete Cluster"
rg -n "### Force Delete Cluster|Force Delete Cluster|force-delete cluster" docs/api-resources.md
# print a chunk around those lines
sed -n '350,410p' docs/api-resources.md
echo "---- docs/api-resources.md force-delete nodepool section for RBAC/access control text ----"
sed -n '540,590p' docs/api-resources.mdRepository: openshift-hyperfleet/hyperfleet-api
Length of output: 7568
Enforce authorization for force-delete endpoints (CWE-285).
- Cluster
POST /api/hyperfleet/v1/clusters/{id}/force-deletehas no caller authorization scoping:pkg/handlers/cluster.goforwards directly topkg/services/cluster.gowhereclusterDao.GetForUpdateselects byid = ?and then hard-deletes cluster + adapter statuses + child nodepools. - NodePool
POST /api/hyperfleet/v1/clusters/{cluster_id}/nodepools/{nodepool_id}/force-deleteonly scopes by path relationship (GetByIDAndOwnerfiltersid = ? AND owner_id = ?withowner_id= cluster_id), not by caller identity/permissions. - Docs describe the operational safety conditions for force-delete, but do not document any RBAC/permission requirements for invoking it.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@README.md` at line 122, The force-delete endpoints lack caller authorization
checks; update the handlers and service layer to enforce RBAC by validating the
calling principal before performing destructive operations: in
pkg/handlers/cluster.go add an authorization check that extracts caller
identity/roles from the request and denies access if not permitted, then pass an
authorized context/principal into pkg/services/cluster.go; in
pkg/services/cluster.go verify the principal's permission prior to calling
clusterDao.GetForUpdate and before deleting adapter statuses/child nodepools;
also tighten NodePool logic where GetByIDAndOwner is used to ensure it validates
caller identity/permissions (not just path ownership) and finally update
README.md to document the required RBAC/permission prerequisites for invoking
the force-delete endpoints.
There was a problem hiding this comment.
We're just listing the endpoints in the README.md. I think we can talk about authorization verification in a later section, with a reminder in api-resources for example, pointing to the authentication.md ?
There was a problem hiding this comment.
@sherine-k — Agreed, the endpoint listing section isn't the place for auth docs. A note in docs/api-resources.md under the Force Delete sections (cluster + nodepool) pointing to docs/authentication.md for access-control requirements is the right structure.
That said, the documentation concern is separate from the underlying code issue (CWE-285): pkg/handlers/cluster.go and pkg/handlers/cluster_nodepools.go currently forward ForceDelete calls straight to the service/DAO layer with no caller-identity authorization check — only path-parameter ownership (id / owner_id) is validated. That's a live security gap independent of what the docs say.
Would you like me to open a follow-up GitHub issue to track the missing authorization enforcement in the force-delete handlers?
|
/test lint |
| listArgs := &ListArguments{ | ||
| Page: 1, | ||
| Size: 100, | ||
| Size: 20, |
There was a problem hiding this comment.
Warning
Blocking
Category: Standards
The PR title uses docs: but this file (and resource.go, types_test.go) contains a functional code change — reducing the default pageSize from 100 to 20. Per the HyperFleet commit standard, all changes must be consistent with the declared type.
This is also a breaking behavioral change for API consumers who rely on the default page size. Suggest:
- Split into a separate commit with
fix:type (e.g.,HYPERFLEET-1186 - fix: reduce default pageSize from 100 to 20) - Call out the behavior change explicitly in the PR description so reviewers and consumers are aware
There was a problem hiding this comment.
Hi @rafabene
Thanks for the review!:+1:
This is already in a separate commit : d2d1e75
I will update the PR title too. This is to align what the api spec sets, which takes precedence over the defaults defined, which sounded a bit high (100) to @rh-amarin and I.
|
|
||
| ### node_pools | ||
| Child resources owned by clusters, representing groups of compute nodes. Uses foreign key relationship with cascade delete. | ||
| Child resources owned by clusters, representing groups of compute nodes. References clusters via `owner_id` with a `RESTRICT` foreign key. Same column layout as clusters (including `labels`, `status_conditions`, `deleted_time`, `deleted_by`). |
There was a problem hiding this comment.
Tip
nit — non-blocking suggestion
Category: Inconsistency
"Same column layout as clusters" is slightly misleading — node_pools has owner_id (FK to clusters) which clusters doesn't. Consider rewording to clarify which columns are shared vs. unique:
| Child resources owned by clusters, representing groups of compute nodes. References clusters via `owner_id` with a `RESTRICT` foreign key. Same column layout as clusters (including `labels`, `status_conditions`, `deleted_time`, `deleted_by`). | |
| Child resources owned by clusters, representing groups of compute nodes. References clusters via `owner_id` with a `RESTRICT` foreign key. Shares the same core columns as clusters (`labels`, `status_conditions`, `deleted_time`, `deleted_by`) plus `owner_id` for the parent relationship. |
| listArgs := &ListArguments{ | ||
| Page: 1, | ||
| Size: 100, | ||
| Size: 20, |
There was a problem hiding this comment.
Tip
nit — non-blocking suggestion
Category: Pattern
The literal 20 is repeated in three places (types.go:28, resource.go:275, resource.go:303). Since there's already a MaxListSize constant at line 22, consider adding a companion:
| Size: 20, | |
| Size: DefaultPageSize, |
with const DefaultPageSize = 20 defined alongside MaxListSize on line 22. This keeps future page-size changes atomic — the PR itself had to update 3 sites to change this value.
|
|
||
| ```json | ||
| { | ||
| "type": "about:blank", |
There was a problem hiding this comment.
Warning
Blocking
Category: Standards
All three error examples use "type": "about:blank", but the code (pkg/errors/errors.go) defines specific type URIs:
ErrorTypeBase = "https://api.hyperfleet.io/errors/"
ErrorTypeValidation = ErrorTypeBase + "validation-error"
ErrorTypeNotFound = ErrorTypeBase + "not-found"
ErrorTypeConflict = ErrorTypeBase + "conflict"The Error Model Standard also mandates these URIs. Please update:
- Line 780:
"about:blank"→"https://api.hyperfleet.io/errors/validation-error" - Line 808:
"about:blank"→"https://api.hyperfleet.io/errors/not-found" - Line 821:
"about:blank"→"https://api.hyperfleet.io/errors/conflict"
| | `type` | string | Yes | URI reference identifying the problem type | | ||
| | `title` | string | Yes | Short human-readable summary | | ||
| | `status` | integer | Yes | HTTP status code | | ||
| | `detail` | string | No | Human-readable explanation specific to this occurrence | |
There was a problem hiding this comment.
Warning
Blocking
Category: Standards
detail is marked "No" in the "Always present" column, but:
- The code (
pkg/errors/errors.goAsProblemDetails()) always populatesDetail: &e.Reason - The Error Model Standard marks
detailas Required - All three examples in this very section include
detail
| | `detail` | string | No | Human-readable explanation specific to this occurrence | | |
| | `detail` | string | Yes | Human-readable explanation specific to this occurrence | |
Summary
This PR reviews and re-aligns the docs in anticipation of v1.0.0 release.
It also fixes the pageSize default to 20, aligning it with the default values in api specs.
Test Plan
make test-allpassesmake lintpassesmake test-helm(if applicable)